action delay
HiLo: Learning Whole-Body Human-like Locomotion with Motion Tracking Controller
Zhang, Qiyuan, Weng, Chenfan, Li, Guanwu, He, Fulai, Cai, Yusheng
Deep Reinforcement Learning (RL) has emerged as a promising method to develop humanoid robot locomotion controllers. Despite the robust and stable locomotion demonstrated by previous RL controllers, their behavior often lacks the natural and agile motion patterns necessary for human-centric scenarios. In this work, we propose HiLo (human-like locomotion with motion tracking), an effective framework designed to learn RL policies that perform human-like locomotion. The primary challenges of human-like locomotion are complex reward engineering and domain randomization. HiLo overcomes these issues by developing an RL-based motion tracking controller and simple domain randomization through random force injection and action delay. Within the framework of HiLo, the whole-body control problem can be decomposed into two components: One part is solved using an open-loop control method, while the residual part is addressed with RL policies. A distributional value function is also implemented to stabilize the training process by improving the estimation of cumulative rewards under perturbed dynamics. Our experiments demonstrate that the motion tracking controller trained using HiLo can perform natural and agile human-like locomotion while exhibiting resilience to external disturbances in real-world systems. Furthermore, we show that the motion patterns of humanoid robots can be adapted through the residual mechanism without fine-tuning, allowing quick adjustments to task requirements.
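The decomposition described in this abstract (an open-loop part plus an RL residual, trained under random force injection and action delay) can be illustrated with a minimal sketch. It is not the HiLo implementation: the sinusoidal reference, the `residual_scale` factor, and the randomization ranges are all hypothetical values chosen only for illustration.

```python
import math
import random
from typing import Dict, List


def open_loop_reference(t: float, num_joints: int = 2) -> List[float]:
    """Hypothetical open-loop joint targets: one phase-shifted sinusoid per joint."""
    return [0.3 * math.sin(2.0 * math.pi * t + j * math.pi) for j in range(num_joints)]


def compose_action(
    reference: List[float], residual: List[float], residual_scale: float = 0.1
) -> List[float]:
    """Add a scaled policy residual on top of the open-loop reference."""
    if len(reference) != len(residual):
        raise ValueError("reference and residual must have the same length")
    return [ref + residual_scale * res for ref, res in zip(reference, residual)]


def sample_randomization(
    rng: random.Random, max_delay_steps: int = 3, max_push_force: float = 50.0
) -> Dict[str, float]:
    """Draw per-episode perturbations (assumed ranges, not taken from the paper)."""
    return {
        "action_delay_steps": rng.randint(0, max_delay_steps),
        "push_force": rng.uniform(-max_push_force, max_push_force),
    }
```

Under these assumptions, scaling the residual toward zero recovers the pure open-loop motion, which is one simple way to picture adjusting a motion pattern without retraining.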
Sim-to-real Transfer of Deep Reinforcement Learning Agents for Online Coverage Path Planning
Jonnarth, Arvi, Johansson, Ola, Felsberg, Michael
Sim-to-real transfer presents a difficult challenge, where models trained in simulation are to be deployed in the real world. The distribution shift between the two settings leads to biased representations of the perceived real-world environment, and thus to suboptimal predictions. In this work, we tackle the challenge of sim-to-real transfer of reinforcement learning (RL) agents for coverage path planning (CPP). In CPP, the task is for a robot to find a path that visits every point of a confined area. Specifically, we consider the case where the environment is unknown, and the agent needs to plan the path online while mapping the environment. We bridge the sim-to-real gap through a semi-virtual environment with a simulated sensor and obstacles, while including real robot kinematics and real-time aspects. We investigate what level of fine-tuning is needed for adapting to a realistic setting, compared to an agent trained solely in simulation. We find that a high model inference frequency is sufficient for reducing the sim-to-real gap, while fine-tuning degrades performance initially. By training the model in simulation and deploying it at a high inference frequency, we transfer state-of-the-art results from simulation to the real domain, where direct learning would take on the order of weeks with manual interaction, i.e., would be completely infeasible.
Neural Laplace Control for Continuous-time Delayed Systems
Holt, Samuel, Hüyük, Alihan, Qian, Zhaozhi, Sun, Hao, van der Schaar, Mihaela
Many real-world offline reinforcement learning (RL) problems involve continuous-time environments with delays. Such environments are characterized by two distinctive features: firstly, the state x(t) is observed at irregular time intervals, and secondly, the current action a(t) only affects the future state x(t + g) with an unknown delay g > 0. A prime example of such an environment is satellite control, where the communication link between Earth and a satellite causes irregular observations and delays. Existing offline RL algorithms have achieved success in environments with either irregularly observed states or known delays. However, environments involving both irregular observations in time and unknown delays remain an open and challenging problem. To this end, we propose Neural Laplace Control, a continuous-time model-based offline RL method that combines a Neural Laplace dynamics model with a model predictive control (MPC) planner and is able to learn from an offline dataset sampled at irregular time intervals from an environment that has an inherent unknown constant delay. We show experimentally that, on continuous-time delayed environments, it is able to achieve near-expert policy performance.
DACOM: Learning Delay-Aware Communication for Multi-Agent Reinforcement Learning
Yuan, Tingting, Chung, Hwei-Ming, Yuan, Jie, Fu, Xiaoming
... improves its policy iteratively by learning from observations to achieve a given goal. RL, with a single agent to decide the behavior of all entities, faces various challenges, such as scalability (Yan et al. 2021) and privacy issues (Yuan, Chung, and Fu 2022). To this end, the extension from single-agent RL to multi-agent RL (MARL) (Hernandez-Leal, Kartal, and Taylor 2019) is favorable. MARL (Hernandez-Leal, Kartal, and Taylor 2019) has been widely used in various tasks, such as real-time resource allocation (Yuan et al. ... Secondly, the communication delay can interfere with the cooperation between agents by introducing delays in action-making (Chen et al. 2021) and uncertainty on the arrival time of information. Previous work (Kim et al. 2019) prevents endless waiting by setting a predefined and constant bound for the waiting time, but it may restrain potential cooperation if it is set too short and conversely may cause meaningless waiting. Therefore, such a constant timer is inflexible and cannot be adapted to the dynamics in the communication networks.
Revisiting State Augmentation methods for Reinforcement Learning with Stochastic Delays
Nath, Somjit, Baranwal, Mayank, Khadilkar, Harshad
Several real-world scenarios, such as remote control and sensing, involve action and observation delays. The presence of delays degrades the performance of reinforcement learning (RL) algorithms, often to such an extent that algorithms fail to learn anything substantial. This paper formally describes the notion of Markov Decision Processes (MDPs) with stochastic delays and shows that delayed MDPs can be transformed into equivalent standard MDPs (without delays) with a significantly simplified cost structure. We employ this equivalence to derive a model-free Delay-Resolved RL framework and show that even a simple RL algorithm built upon this framework achieves near-optimal rewards in environments with stochastic delays in actions and observations. The delay-resolved deep Q-network (DRDQN) algorithm is benchmarked on a variety of environments comprising multi-step and stochastic delays and results in better performance, both in terms of achieving near-optimal rewards and minimizing the computational overhead thereof, with respect to the currently established algorithms.
Delay-Aware Multi-Agent Reinforcement Learning for Cooperative and Competitive Environments
Chen, Baiming, Xu, Mengdi, Liu, Zuxin, Li, Liang, Zhao, Ding
Action and observation delays are prevalent in real-world cyber-physical systems and may pose challenges in reinforcement learning design. Handling them is particularly arduous in multi-agent systems, where the delay of one agent could spread to other agents. To resolve this problem, this paper proposes a novel framework to deal with delays as well as the non-stationary training issue of multi-agent tasks with model-free deep reinforcement learning. We formally define the Delay-Aware Markov Game that incorporates the delays of all agents in the environment. To solve Delay-Aware Markov Games, we apply centralized training and decentralized execution, which allows agents to use extra information to ease the non-stationarity issue of the multi-agent systems during training, without the need for a centralized controller during execution. Experiments are conducted in multi-agent particle environments including cooperative communication, cooperative navigation, and competitive experiments. We also test the proposed algorithm in traffic scenarios that require coordination of all autonomous vehicles to show the practical value of delay-awareness. Results show that the proposed delay-aware multi-agent reinforcement learning algorithm greatly alleviates the performance degradation introduced by delay. Code and demo videos are available at: https://github.com/baimingc/delay-aware-MARL.
Delay-Aware Model-Based Reinforcement Learning for Continuous Control
Chen, Baiming, Xu, Mengdi, Li, Liang, Zhao, Ding
Action delays degrade the performance of reinforcement learning in many real-world systems. This paper proposes a formal definition of a delay-aware Markov Decision Process and proves it can be transformed into a standard MDP with augmented states using the Markov reward process. We develop a delay-aware model-based reinforcement learning framework that can incorporate the multi-step delay into the learned system models without learning effort. Experiments with the Gym and MuJoCo platforms show that the proposed delay-aware model-based algorithm is more efficient in training and transferable between systems with various durations of delay compared with off-policy model-free reinforcement learning methods. Code is available at: https://github.com/baimingc/dambrl.
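The state-augmentation idea mentioned in this abstract can be sketched for the simplest case, a known constant action delay. This is an illustrative sketch and not the code from the linked repository: the class `DelayedActionEnv`, the toy dynamics `integrator_step`, and the no-op fill action are all hypothetical. The environment applies the action chosen `delay` steps earlier, and the observation is extended with the queue of actions still in flight so that it is Markov again.

```python
from collections import deque
from typing import Callable, Deque, Tuple

AugmentedObs = Tuple[float, Tuple[float, ...]]


class DelayedActionEnv:
    """Applies each action `delay` steps late and exposes an augmented observation."""

    def __init__(
        self,
        step_fn: Callable[[float, float], Tuple[float, float]],
        initial_state: float,
        delay: int,
        noop_action: float = 0.0,
    ) -> None:
        if delay < 0:
            raise ValueError("delay must be non-negative")
        self.step_fn = step_fn
        self.initial_state = initial_state
        self.delay = delay
        self.noop_action = noop_action
        self.state = initial_state
        self.pending: Deque[float] = deque()
        self.reset()

    def reset(self) -> AugmentedObs:
        self.state = self.initial_state
        # Until real actions arrive, the queue holds placeholder no-op actions.
        self.pending = deque([self.noop_action] * self.delay)
        return self._augmented()

    def _augmented(self) -> AugmentedObs:
        # Augmented state: raw state plus the actions chosen but not yet applied.
        return (self.state, tuple(self.pending))

    def step(self, action: float) -> Tuple[AugmentedObs, float]:
        self.pending.append(action)
        applied = self.pending.popleft()  # the action chosen `delay` steps ago
        self.state, reward = self.step_fn(self.state, applied)
        return self._augmented(), reward


def integrator_step(state: float, action: float) -> Tuple[float, float]:
    """Toy dynamics (hypothetical): the state accumulates the applied action."""
    next_state = state + action
    return next_state, -abs(next_state)
```

With `delay=2`, the first two calls to `step` leave the raw state unchanged because only the placeholder actions are applied; the agent's first action takes effect on the third call. A policy that conditions on the full tuple can account for what is already queued.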
At Human Speed: Deep Reinforcement Learning with Action Delay
Firoiu, Vlad, Ju, Tina, Tenenbaum, Josh
There has been a recent explosion in the capabilities of game-playing artificial intelligence. Many classes of tasks, from video games to motor control to board games, are now solvable by fairly generic algorithms, based on deep learning and reinforcement learning, that learn to play from experience with minimal prior knowledge. However, these machines often do not win through intelligence alone -- they possess vastly superior speed and precision, allowing them to act in ways a human never could. To level the playing field, we restrict the machine's reaction time to a human level, and find that standard deep reinforcement learning methods quickly drop in performance. We propose a solution to the action delay problem inspired by human perception -- to endow agents with a neural predictive model of the environment which "undoes" the delay inherent in their environment -- and demonstrate its efficacy against professional players in Super Smash Bros. Melee, a popular console fighting game.
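The notion of a predictive model that "undoes" the delay can be pictured as rolling a dynamics model forward through the actions that have been issued but not yet observed to take effect. The sketch below is only an illustration under strong assumptions: `undo_delay` and `toy_model` are hypothetical names, and a simple additive function stands in for the neural predictive model described in the abstract.

```python
from typing import Callable, Sequence


def undo_delay(
    delayed_state: float,
    pending_actions: Sequence[float],
    model: Callable[[float, float], float],
) -> float:
    """Estimate the current state by applying the model once per in-flight action."""
    state = delayed_state
    for action in pending_actions:
        state = model(state, action)
    return state


def toy_model(state: float, action: float) -> float:
    """Hypothetical stand-in for a learned model: the state shifts by the action."""
    return state + action
```

For example, `undo_delay(1.0, [0.5, -0.25, 1.0], toy_model)` steps the delayed observation forward three times, and the agent would then act on that estimate rather than on the stale observation.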
Pervasive Model Adaptation: The Integration of Planning and Information Gathering in Dynamic Production Systems
Liu, Juan (PARC) | Kuhn, Lukas (PARC) | Kleer, Johan de (PARC) | Zhou, Rong (PARC)
Model-based planning often presumes a static system model, while in practice a physical system may evolve or drift over time. This paper proposes the idea of pervasive model adaptation in a production system, where the model is dynamically updated using observations of production output. The core idea is the interplay between model adaptation and production planning. We seek plans which simultaneously serve the goals of achieving high productivity for production and gathering information for model adaptation. We use a modular printing example to illustrate issues such as the formulation of the information criterion and the search strategy for informative plans. The idea of pervasive adaptation can be further extended to improve long-term productivity in production systems.